Abstract. The purpose of this article is to introduce a new iterative
algorithm that generates random bipartite graphs (bigraphs) with properties
resembling those of real-life bipartite graphs. The algorithm generates a
wide range of random bigraphs whose features are determined by a set of
parameters. We adapt the advances of the last decade in unipartite complex
network modeling to the bigraph setting. This data structure can be observed
in several situations. However, only a few datasets are freely available for
testing the algorithms that operate on such data (e.g. community detection,
influential node identification, information retrieval). Therefore,
artificial datasets are needed to support the development and testing of
these algorithms. We are particularly interested in applying the generator
to the analysis of recommender systems. Therefore, we focus on two
characteristics that, besides simple statistics, are in our opinion
responsible for the performance of neighborhood-based collaborative
filtering algorithms: the node degree distribution and the local clustering
coefficient.

Keywords: complex networks, random graphs, bipartite graphs,
recommender systems, affiliation networks

1 Introduction 

The analysis of large networks is driven by the desire to understand and model
phenomena as diverse as the spread of infection, the creation of social
communities, protein interactions or website importance assessment [1]. The
interest of the research community in complex networks was fueled by empirical
evidence that some properties of real-life graphs cannot be reproduced by
classic random models. Moreover, similar properties are common to networks
observed in various fields. Several statistics describing networks can be
measured. However, the node degree distribution and the mean clustering
coefficient are two measures of great importance. They are correlated, for
example, with such macro features as the average length of a path between two
nodes, the network's resilience to an attack or the pace at which innovations
spread. It turns out that in diverse real-life networks:

— node degree distribution is heavy-tailed

— mean clustering coefficient is bounded away from zero

In the classic theory of random graphs developed by two Hungarian
mathematicians, Paul Erdős and Alfréd Rényi [2], the asymptotic node degree
distribution is Poisson. Also, the value of the clustering coefficient, which
measures the probability that two nodes sharing a friend are connected,
differs from empirical results and tends to zero as the number of nodes grows.

The seminal paper of Barabási and Albert [3] describes the driving forces
responsible for heavy-tailed node degree distributions. The property can be
attributed to the combination of growth and the preferential attachment
mechanism; neither of the two yields the desired distribution on its own.
Kumar and collaborators [4] proposed to substitute the preferential attachment
mechanism with random selection of a neighboring node, which also leads to a
heavy-tailed distribution. Liu [5] described how a mixture of preferential and
random attachment generates networks with a weakened heavy tail. Vázquez [6]
proposed a random graph generative procedure which results in networks with
positive values of the clustering coefficient. The combined translation of
these four results to bigraphs forms the frame of our algorithm.

Recently a few random bipartite graph generating algorithms have been
introduced ([7], [8], [9], [10]). However, none of them can generate growing
networks with varying degree distributions and a clustering coefficient
bounded away from zero.

Our contribution comprises four main results:

1. definition and formal justification of a new local clustering coefficient
dedicated to bigraphs: the bipartite local clustering coefficient (BLCC)

2. introduction of a bouncing mechanism responsible for the growth of BLCC

3. description and analysis of a new versatile bigraph generator

4. identification of a relationship between the network properties of bigraphs
and the properties responsible for the complexity of recommender systems

The rest of the article is organized as follows. In Section 2 we formalize node
degree distributions and the local clustering coefficient, and introduce BLCC.
In Section 3 we outline the motivation for our research, which is based on the
equivalence of bipartite graphs and user-item matrices in recommender systems.
Section 4 contains a description of our algorithm. In Section 5 we present
the results of numerical simulations. Section 6 is dedicated to concluding
remarks. Advanced mathematical transformations are described in detail in two
appendices.
2 Background

A graph is an ordered pair $G = (V, E)$ comprising a set of vertices $V$ and a
set of edges $E \subseteq V \times V$. A bipartite network is a graph
$G = (U \cup I, E)$ whose vertices can be labeled by two types, $U$ and $I$.
The difference from a classic unipartite graph is that $V$ consists of two
disjoint sets, $V = U \cup I$ with $U \cap I = \emptyset$, and edges exist
only between nodes of different types, $E \subseteq U \times I$. We analyze
undirected graphs.



2.1 Node degree 

The degree of a node is the number of its direct (first) neighbors and is
equal to the number of the node's edges. The probability density function
(pdf) of node degree distributions in real-life datasets is usually skewed
(Fig. 2). If the tail decays slowly we can observe the power-law distribution
$pdf_{PL}(x) = a x^{-k}$. The tail vanishes quickly in the exponential
distribution $pdf_{EX}(x) = \lambda e^{-\lambda x}$. It is convenient to
visualize the two distributions on a log-log scale. From the fact that
$\log(pdf_{PL}(x)) = -k \log(x) + \log(a)$ it follows that the power-law
distribution forms a straight line on a log-log chart. This distribution is
called scale-free because $pdf_{PL}(cx) = a (cx)^{-k} = c^{-k} pdf_{PL}(x)$.
The distributions observed in real networks cannot be generated by classic
random graphs. The graphs studied by Erdős and Rényi give the Poisson
distribution. The three types of distributions are drawn in Fig. 1.




[Figure 1: bar chart of three degree distributions (Poisson, power-law,
exponential); x-axis: node degree, from 1 to 19.]

Fig. 1. Three degree distributions with the same average. The Poisson
distribution is characteristic of classic random graphs. The exponential and
the power-law distributions are more common in real datasets. Both of them are
skewed. However, the tail of the power-law distribution decays more slowly.
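
The log-log behaviour described in Section 2.1 can be checked numerically. The
sketch below is only an illustration: the constants a = 1, k = 2.5 and
lambda = 0.5 are arbitrary values chosen here, not taken from any dataset. It
computes the slope between consecutive points of (log x, log pdf(x)). For the
power law the slope always equals -k, whereas for the exponential density it
keeps decreasing.

```python
import math

def pdf_power_law(x, a=1.0, k=2.5):
    """Power-law density a * x**(-k); a and k are illustrative values."""
    return a * x ** (-k)

def pdf_exponential(x, lam=0.5):
    """Exponential density lam * exp(-lam * x); lam is an illustrative value."""
    return lam * math.exp(-lam * x)

def loglog_slopes(pdf, xs):
    """Slopes between consecutive points of (log x, log pdf(x))."""
    pts = [(math.log(x), math.log(pdf(x))) for x in xs]
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

xs = [1, 2, 4, 8, 16]
power_law_slopes = loglog_slopes(pdf_power_law, xs)      # constant, equal to -k
exponential_slopes = loglog_slopes(pdf_exponential, xs)  # steadily decreasing
```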
2.2 Local clustering coefficient

The local clustering coefficient measures the probability that if two nodes
share a neighbor then they are also connected. It is computed for each node,
and an average over all nodes indicates the level of the network's
transitivity. Let us denote by $c_j$ the number of connected pairs among the
direct neighbors of node $j$ and by $k_j$ the degree of node $j$. The local
clustering coefficient (LCC) is given by:

    $LCC_j = \frac{2 c_j}{k_j (k_j - 1)}$    (1)

The value of LCC is zero for any node in a bipartite graph. Therefore, we
introduce a new coefficient dedicated to measuring transitivity in bigraphs.
The bipartite local clustering coefficient (BLCC) of node $j$ equals one minus
the ratio of the number of the node's second neighbors to the potential
number of its second neighbors. The value of BLCC calculated for node $j$ is
given by:

    $BLCC_j = 1 - \frac{|N_2(j)|}{\sum_{i \in N_1(j)} (k_i - 1)}$    (2)

where $|N_2(j)|$ stands for the number of the second neighbors of node $j$,
and $N_1(j)$ is the set of the first neighbors of node $j$.
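
A minimal sketch of how BLCC might be evaluated, assuming that the potential
number of second neighbors of node j is the sum of (k_i - 1) over its first
neighbors i. The adjacency-dictionary representation, the function name `blcc`
and the toy graph are assumptions made for this illustration only.

```python
def blcc(adj, j):
    """BLCC of node j: 1 - |N2(j)| / sum over i in N1(j) of (k_i - 1).

    adj maps every node (of either modality) to the set of its neighbors.
    Returns None when the potential number of second neighbors is zero.
    """
    first = adj[j]
    # Second neighbors: nodes two steps away from j, excluding j itself.
    second = set().union(*(adj[i] for i in first)) - {j}
    potential = sum(len(adj[i]) - 1 for i in first)
    if potential == 0:
        return None
    return 1.0 - len(second) / potential

# Toy bigraph (hypothetical): three users, two items.
adj = {
    "u1": {"i1", "i2"}, "u2": {"i1", "i2"}, "u3": {"i2"},
    "i1": {"u1", "u2"}, "i2": {"u1", "u2", "u3"},
}
blcc_u1 = blcc(adj, "u1")  # u2 is reached twice, so the neighborhood shrinks
blcc_u3 = blcc(adj, "u3")  # tree-like neighborhood, no shrinking
```

In the toy graph, u1 has two distinct second neighbors out of three potential
ones, so its BLCC is 1/3, while the neighborhood of u3 is tree-like and its
BLCC is 0.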

In order to justify the correlation between LCC and BLCC, we consider
the values of the two coefficients in the case of a unipartite graph. We denote
by $f(c)$ in Eq. (3) the value of LCC calculated for a random node with $c$
pairs of connected neighbors. We use $g(c)$ in Eq. (4) to assess the value of
BLCC for the same node. Apart from the $c$ pairs we follow the tree-like
structure assumption. We substitute $k_i$ with
$\langle k^2 \rangle / \langle k \rangle$ (i.e. the expected degree of a
neighboring node^1) and observe that on average $|N_1(j)| = \langle k \rangle$.
The logic of deriving $|N_2(j)|$ is presented in Fig. 3.

    $f(c) = \frac{2c}{\langle k \rangle (\langle k \rangle - 1)} = \frac{2c}{\langle k \rangle^2 - \langle k \rangle}$    (3)

    $g(c) = 1 - \frac{\langle k \rangle \frac{\langle k^2 \rangle}{\langle k \rangle} - \langle k \rangle - 2c}{\langle k \rangle \left( \frac{\langle k^2 \rangle}{\langle k \rangle} - 1 \right)} = \frac{2c}{\langle k^2 \rangle - \langle k \rangle}$    (4)

From the fact that the variance of any distribution is nonnegative and can
be decomposed as $\sigma^2 = \langle k^2 \rangle - \langle k \rangle^2$, we
assert that $g(c)/f(c)$ is constant and not larger than one.
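
The last claim can be spelled out in one line. The worked step below assumes
Eqs. (3) and (4) in the forms $f(c) = 2c / (\langle k \rangle^2 - \langle k \rangle)$
and $g(c) = 2c / (\langle k^2 \rangle - \langle k \rangle)$. The Poisson
example is added here purely as an illustration.

```latex
\frac{g(c)}{f(c)}
  = \frac{2c}{\langle k^2 \rangle - \langle k \rangle}
    \cdot \frac{\langle k \rangle^2 - \langle k \rangle}{2c}
  = \frac{\langle k \rangle^2 - \langle k \rangle}
         {\langle k^2 \rangle - \langle k \rangle}
  \le 1,
  \qquad \text{since } \langle k^2 \rangle - \langle k \rangle^2 = \sigma^2 \ge 0 .

% Illustration: Poisson degrees with <k> = 4 have <k^2> = 4 + 4^2 = 20, so
\frac{g(c)}{f(c)} = \frac{4^2 - 4}{20 - 4} = \frac{12}{16} = 0.75 .
```

The ratio does not depend on $c$, which is why the two coefficients are
proportional under the stated assumptions.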

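We also considered a different definition of the number of potential second
neighbors in Eq. (2). Within the local tree-like structure setting [15] it can
be approximated by
$\langle u \rangle \left( \frac{\langle v^2 \rangle}{\langle v \rangle} - 1 \right)$,
where $\langle u \rangle$ is the mean user degree and $\langle v \rangle$,
$\langle v^2 \rangle$ are the first and the second moments of the item degree
distribution. Even though on average such a definition gives positive
fractions (Table 1), the value of BLCC calculated for a single node can be
negative, and therefore we stay with the definition of BLCC given in Eq. (2).

^1 The formula for the average degree of a neighboring node is derived in
Appendix A.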
[Figure 2: three log-log plots of node degree distributions. Upper chart:
TAG - RESOURCE graph (BibSonomy); middle chart: MOVIE - ACTOR graph; lower
chart: USER - GROUP graph (CiteULike), with separate markers for user and
group nodes.]

Fig. 2. The node degree distributions of three bipartite graphs. The straight
line of points (on a log-log scale) in all three datasets indicates the
power-law character of the datasets. In the BibSonomy [11] (upper chart) and
IMDB [12] (middle chart) graphs, one modality tends towards an exponential
distribution. In the CiteULike [13] (lower chart) dataset both modalities form
a straight line.



6 Szymon Chojnacki and Mieczyslaw Klopotek 



[Figure 3: a node, its first neighbors and the mean number of its second
neighbors.]

Fig. 3. In order to compute the BLCC for a unipartite graph we need to assess
the potential number of the second neighbors of a given node. A random node
has $\langle k \rangle$ neighbors (in the figure $\langle k \rangle = 4$).
There are $c$ connections among the neighbors on average ($c = 2$). Each
neighbor has on average $\langle k^2 \rangle / \langle k \rangle$ edges. Each
edge points to a second neighbor of the considered node, to the node itself
($\langle k \rangle$ edges) or to a first neighbor ($2c$ edges). We assume
that no two different edges point to the same second neighbor.



                         basic statistics                second neighbors
Dataset               users      items        edges       real    theoretic  real/theoretic
CEO [16]                 26         15           98       21.8         22.0            0.99
CiteULike [13]        5 208      2 336        7 196       14.2         23.9            0.59
BibSonomy             3 617     93 756      253 366      500.4      6 579.2            0.08
YouTube [17]         94 238     30 087      293 360    1 269.6      2 101.3            0.60
IMDB [12]           383 640    127 823    1 470 404       78.4        211.4            0.37
Flickr [17]         395 979    103 631    8 545 307    1 217.4     52 704.9            0.02
LiveJournal [17]  3 201 203  7 489 073  112 307 385  785 194.2  1 521 273.4            0.52
Orkut [17]        2 783 196  8 730 857  327 037 487  334 863.6  2 294 114.8            0.15

Table 1. The average number of second neighbors in eight real-life datasets
is smaller than the value approximated by Newman's asymptotic formula
(theoretic value). The most significant shrinking is observed in the Flickr
dataset. The shrinking is observed in both relatively small and very large
datasets.
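
The comparison reported in Table 1 can be sketched as follows, assuming that
the theoretic value is the tree-like approximation
$\langle u \rangle (\langle v^2 \rangle / \langle v \rangle - 1)$ from
Section 2.2. The function name, the dictionary-of-sets input format and the
toy data are assumptions of this illustration. Items that no user is linked
to are not counted in the item degree moments.

```python
def second_neighbor_stats(user_items):
    """Return (real, theoretic) average numbers of second neighbors of a user.

    user_items maps each user to the set of items it is linked to.
    theoretic = <u> * (<v^2>/<v> - 1), the tree-like approximation.
    """
    # Invert the mapping: item -> set of users linked to it.
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    n_users = len(user_items)
    # Real value: distinct users reachable in two steps, averaged over users.
    real = sum(
        len(set().union(*(item_users[i] for i in items)) - {user})
        for user, items in user_items.items()
    ) / n_users

    mean_u = sum(len(items) for items in user_items.values()) / n_users
    item_degrees = [len(users) for users in item_users.values()]
    mean_v = sum(item_degrees) / len(item_degrees)
    mean_v2 = sum(d * d for d in item_degrees) / len(item_degrees)
    theoretic = mean_u * (mean_v2 / mean_v - 1)
    return real, theoretic

# Toy data (hypothetical): three users, two items.
real, theoretic = second_neighbor_stats(
    {"u1": {"i1", "i2"}, "u2": {"i1", "i2"}, "u3": {"i2"}}
)
ratio = real / theoretic
```

For the toy data the real average is 2 and the approximation gives 8/3, so the
ratio is 0.75: the overlap between the item sets of u1 and u2 shrinks the
neighborhood relative to the tree-like estimate.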
3 Recommender systems

Recommender systems are an important component of the Intelligent Web. They
make information retrieval easier and move users from typing queries
towards clicking on suggested links. We experience real-life recommender
systems when browsing for books, movies or music. The engines are an essential
part of such websites as Amazon, MovieLens or Last.fm. The interest of the
research community in these systems was fueled by the Netflix movie
recommendation competition [18]. During the challenge, state-of-the-art
systems in terms of accuracy were developed.

However, it has been shown recently during the ECML Discovery Challenge
2009 [19] that the most accurate recommender systems fail to meet real-life
constraints. It is not an easy task to update trained models when new items
or users enter the evaluation. The problem is usually referred to as the Cold
Start problem. These observations constitute the motivation for our research.
We believe that there is a need for algorithms that can generate random
recommendation matrices (or, equivalently, bipartite graphs). We are
particularly interested in neighborhood-based techniques. These methods are
the best suited to dynamically changing scenarios, but the latency of creating
a recommendation depends significantly on the structure of the underlying
dataset (compare Fig. 4). Moreover, because our generator is iterative, it can
be used to simulate Cold Start cases.



[Figure 4: users and items of a bipartite graph, annotated with:
$\langle u \rangle$ and $\langle u^2 \rangle$, the first and the second moment
of the user degree distribution; $\langle v \rangle$ and $\langle v^2 \rangle$,
the first and the second moment of the item degree distribution; the average
number of the first neighbors of a random user, $\langle u \rangle$; the
average number of the second neighbors of a random user; and the average
number of the third neighbors of a random user,
$\langle u \rangle \left( \frac{\langle v^2 \rangle}{\langle v \rangle} - 1 \right) \left( \frac{\langle u^2 \rangle}{\langle u \rangle} - 1 \right)$.]

Fig. 4. In recommender systems based on the neighborhood principle, the
recommended items are selected from the items of the users that have rated at
least one item in common with the analyzed user.
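
The neighborhood principle described in the caption can be sketched as
follows. This is an illustrative helper, not the implementation of any
particular recommender system: the function name, the input format and the
decision to exclude the user's own items from the candidates are assumptions
made here.

```python
def neighborhood(user_items, user):
    """Return (similar_users, candidate_items) for a neighborhood recommender.

    similar_users: users sharing at least one item with `user`.
    candidate_items: items of those users that `user` is not yet linked to.
    """
    own = user_items[user]
    similar = {
        other for other, items in user_items.items()
        if other != user and own & items
    }
    candidates = set()
    for other in similar:
        candidates |= user_items[other]
    return similar, candidates - own

# Hypothetical ratings: user -> set of rated items.
ratings = {"ann": {"a", "b"}, "bob": {"b", "c"}, "cid": {"d"}}
similar_users, candidate_items = neighborhood(ratings, "ann")
```

The sizes of the two returned sets correspond to the quantities that drive
the latency of a neighborhood-based recommendation: the more similar users and
items there are, the more work the recommender has to do.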
4 Our algorithm

Our algorithm consists of three steps: (1) new node creation, (2) edge
attachment type selection and (3) running the bouncing mechanism. The
procedure requires specifying eight parameters:

m - the number of initial loose edges with a user and an item at the ends
T - the number of iterations
p - the probability that a new node is a user
    ($1 - p$ is the probability that a new node is an item)
u - the number of edges created by each new user
v - the number of edges created by each new item
$\alpha$ - the probability that a new user's edge is connected to
    an item with preferential attachment
$\beta$ - the probability that a new item's edge is connected to
    a user with preferential attachment
b - the fraction of preferentially attached edges
    that were created via the bouncing mechanism

Steps (1) and (2) are explained in Sec. 4.1 and analyzed in Sec. 4.2. Step (3)
is discussed in Sec. 4.3.



4.1 Basic model 

In the basic model we utilize the first seven parameters. The bouncing
mechanism is applied in the full model as an additional third step.



[Figure 5: flow of the basic model. Initialize ($m = 2$) with users and items;
draw modality (add a user or add an item); choose each edge's attachment type
(random attachment or preferential attachment).]

Fig. 5. The bipartite random graph generator is initialized with a set of $m$
pairs of users and items. During each iteration two steps are performed. In
the first step the type of the new node is determined. In the second step a
decision is made, for each edge of the new node, whether to draw its other end
with preferential attachment or randomly. In the preferential attachment
variant the probability that a node is drawn is proportional to its degree.
The basic model is based on an iterative repetition of two steps (Fig. 5).

Step 1 If a random number is less than $p$, create a new user with $u$ loose
edges; otherwise create a new item with $v$ loose edges.

Step 2 For each edge decide whether to join it to a node of the second modality
randomly or with preferential attachment. The probability of selecting
preferential attachment is $\alpha$ for a new user and $\beta$ for a new item.
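
The two steps can be sketched in a few lines. This is a minimal illustration
under stated assumptions, not the authors' implementation: the start graph is
taken to be m disjoint user-item pairs, alpha and beta are read as the
probabilities of preferential attachment (as in the parameter list of
Section 4), and repeated draws of the same node are allowed, so multi-edges
may occur. The function name and the `seed` argument are hypothetical.

```python
import random

def generate_basic(m, T, p, u, v, alpha, beta, seed=0):
    """Sketch of the basic model; returns (n_users, n_items, edges)."""
    rng = random.Random(seed)
    n_users, n_items = m, m
    # One entry per edge: edge i joins user_ends[i] and item_ends[i].
    # Sampling uniformly from these lists picks a node with probability
    # proportional to its degree (preferential attachment).
    user_ends = list(range(m))
    item_ends = list(range(m))

    for _ in range(T):
        if rng.random() < p:
            # Step 1: a new user with u loose edges.
            # Step 2: each edge is joined to an existing item.
            targets = [
                rng.choice(item_ends) if rng.random() < alpha
                else rng.randrange(n_items)
                for _ in range(u)
            ]
            user_ends.extend([n_users] * u)
            item_ends.extend(targets)
            n_users += 1
        else:
            # Symmetric case: a new item with v loose edges.
            targets = [
                rng.choice(user_ends) if rng.random() < beta
                else rng.randrange(n_users)
                for _ in range(v)
            ]
            item_ends.extend([n_items] * v)
            user_ends.extend(targets)
            n_items += 1
    return n_users, n_items, list(zip(user_ends, item_ends))

n_users, n_items, edges = generate_basic(
    m=5, T=200, p=0.5, u=3, v=2, alpha=0.5, beta=0.5, seed=1
)
```

Keeping one list entry per edge end makes the preferential draw a single
uniform choice, at the cost of memory proportional to the number of edges.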



4.2 Formal analysis 

One can see that after $t$ iterations the bigraph consists of
$|U(t)| = 2m + pt$ users, $|I(t)| = 2m + (1 - p)t$ items, and
$|E(t)| = 4m + t(pu + (1 - p)v)$ edges. Let us denote by $\eta$ the average
number of edges created during one iteration, $\eta = pu + (1 - p)v$.
After relatively many iterations ($t \gg m$) we can neglect $m$. In the
presented model, the average user degree is:

    $\frac{|E(t)|}{|U(t)|} = \frac{4m + t(pu + (1 - p)v)}{2m + pt} \approx \frac{\eta}{p}$,

analogously the average item degree is:

    $\frac{|E(t)|}{|I(t)|} \approx \frac{\eta}{1 - p}$;

the values are time invariant, but depend on both $u$ and $v$.

In the following derivation we take the perspective of the user modality.
However, the computations can easily be adapted to the item modality. In order
to derive the asymptotic node degree distribution in our model we need to
specify the probability that a user node $j$ with degree $k_j$ gets connected
to a new item. The quantity is usually denoted by $\Pi(k_j)$ within the complex
networks community. If nodes are selected randomly then:

    $\Pi_{random}(k_j) = \frac{1}{|U(t)|} \approx \frac{1}{pt}$.

In the case of random attachment $\Pi(k_j)$ does not depend on $k_j$. If nodes
are selected in accordance with the preferential attachment rule then:

    $\Pi_{preferential}(k_j) = \frac{k_j}{|E(t)|} \approx \frac{k_j}{\eta t}$.

Contrary to the random attachment scenario, the probability of a node's
selection is linearly proportional to its current degree. The probability of
drawing a node with degree $k_j$ is the degree divided by the number of edges.
We can verify that by summing the values of $\Pi$ over all user nodes we get
one, $\sum_j \Pi(k_j) = 1$. In our model the decision whether to draw a user
for an item with random or preferential attachment depends on $\beta$, hence
the combined formula is:

    $\Pi(k_j) = \beta \frac{1}{pt} + (1 - \beta) \frac{k_j}{\eta t}$    (5)

Note that in Eq. (5), and in the derivation that follows, $\beta$ weights the
random attachment term, so $\beta = 0$ corresponds to purely preferential
attachment.
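
The normalization of the combined attachment probability can be checked
numerically. The sketch assumes Eq. (5) in the form
$\Pi(k_j) = \beta / (pt) + (1 - \beta) k_j / (\eta t)$, with $pt$ users and
$\eta t$ edges in the graph. The function name and the sample degree sequence
are hypothetical.

```python
def attachment_probability(k_j, t, p, eta, beta):
    """Pi(k_j) = beta / (p t) + (1 - beta) k_j / (eta t)."""
    pi_random = 1.0 / (p * t)            # uniform over the p*t users
    pi_preferential = k_j / (eta * t)    # proportional to degree
    return beta * pi_random + (1.0 - beta) * pi_preferential

# Hypothetical state: t = 100, p = 0.5 gives 50 users; eta = 4 gives 400 edges.
t, p, eta, beta = 100, 0.5, 4.0, 0.3
degrees = [2] * 25 + [14] * 25           # 50 users, degrees sum to 400
total = sum(attachment_probability(k, t, p, eta, beta) for k in degrees)
```

The probabilities sum to one for any beta as long as the number of users is
$pt$ and the degrees sum to $\eta t$.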
Equation (5) enables us to describe the growth rate of the degree $k_j$ of a
user node as

    $\frac{\partial k_j}{\partial t} = (1 - p) v \Pi(k_j)$.    (6)

We assume in the above equation that the time interval between iterations is
very small and that all nodes with a given degree grow in the same way. We
show in Appendix B that

    $p(k) \propto \left( \beta\eta + p(1 - \beta)k \right)^{-\frac{\eta}{(1 - p)(1 - \beta)v} - 1}$.    (7)

One can verify that for $\beta = 0$ we get a power-law distribution. If
$\beta \to 1$, we can utilize the fact that
$\lim_{n \to \infty} (1 + x/n)^n = e^x$ in order to obtain an exponential
distribution. The above result is consistent with [3]. When we put
$\beta = 0$, $p = 0.5$ and $u = v$ we have a power-law distribution with the
scaling exponent equal to 3.
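
The special case quoted above can be verified with a short calculation. The
sketch assumes that the tail of the user degree distribution behaves as
$k^{-\gamma}$ with $\gamma = \eta / ((1 - p)(1 - \beta)v) + 1$ and
$\eta = pu + (1 - p)v$, which is the exponent obtained in Appendix B. The
function name is hypothetical.

```python
def power_law_exponent(p, u, v, beta):
    """Exponent gamma = eta / ((1 - p)(1 - beta) v) + 1 of the degree tail.

    Defined for beta < 1; beta weights the random attachment term.
    """
    eta = p * u + (1 - p) * v
    return eta / ((1 - p) * (1 - beta) * v) + 1

gamma_pure_preferential = power_law_exponent(p=0.5, u=7, v=7, beta=0.0)
gamma_mostly_random = power_law_exponent(p=0.5, u=7, v=7, beta=0.9)
```

With $\beta = 0$, $p = 0.5$ and $u = v$ the exponent is 3. As $\beta$
approaches one the exponent grows without bound, which matches the transition
towards a quickly vanishing, exponential tail.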



4.3 Full model 

We have shown recently that the node degree distributions of both modalities
can be responsible for BLCC in some networks, but in others there exist
additional shrinking forces responsible for high values of BLCC [20].
Therefore we introduce the bouncing mechanism (Fig. 6), which is based on the
surfing-the-web technique [6]. The mechanism enables us to raise BLCC, but can
only be applied to the edges that are to be selected with preferential
attachment. This can be attributed to the fact that the probability that a
random walk finishes in a node is proportional to the node's degree [21].
Bouncing is performed in three micro steps: (1) a random node is drawn from
the nodes that are already joined with the new node, (2) a random neighbor of
the drawn node is chosen, (3) a random neighbor of that neighbor is selected
for joining with the new node.
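
The three micro steps translate directly into code. The sketch below is an
illustration only: the function name, the adjacency-list format and the
optional `rng` argument are assumptions, and `adj` is expected to describe the
graph before the new node's edges are inserted, so that the walk cannot step
back to the new node itself.

```python
import random

def bounce(new_node_neighbors, adj, rng=random):
    """Pick a node for the new node via the three bouncing micro steps.

    new_node_neighbors: nodes already joined with the new node (non-empty).
    adj: maps every existing node to a non-empty list of its neighbors.
    """
    start = rng.choice(list(new_node_neighbors))  # (1) already joined node
    middle = rng.choice(adj[start])               # (2) its random neighbor
    return rng.choice(adj[middle])                # (3) neighbor of that neighbor

# Hypothetical bigraph: user u1 is linked to items i1 and i2.
adj = {"i1": ["u1"], "u1": ["i1", "i2"], "i2": ["u1"]}
picked = bounce(["i1"], adj, random.Random(0))
```

The returned node has the same modality as the starting node and may coincide
with it. A generator built on this helper would need to decide whether such a
repeated selection is kept or redrawn.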
[Figure 6: the four stages of adding a new user: 1) a new user is created;
2) an attachment type is drawn for each edge; 3) the number of bounced nodes
is set; 4) bouncing is performed.]

Fig. 6. For each edge of a new node that is to be connected to an existing
node in accordance with the preferential attachment mechanism, a decision is
made whether to create it via the bouncing mechanism. When a new user node is
attached, $u$ new edges are created. On average $u \cdot \alpha$ edge endings
are to be drawn preferentially and $u \cdot \alpha \cdot b$ of them are to be
obtained via bouncing from the nodes that are already selected.



Algorithm 1: An iteration of the bipartite graph generator

if RAND() < p then
    // p - the probability that a new node is a user
    for k ← 1 to u do
        // u - the number of edges created by a new user
        if RAND() < α then
            // α - the probability that the new user's item is
            // drawn preferentially
            if RAND() < b then
                // b - the probability that a new preferential node
                // was chosen by bouncing
                SelectedItem ← BounceFromRandom(TempItems);
            else
                SelectedItem ← DrawItemPreferentially();
        else
            SelectedItem ← DrawItemRandomly();
        TempItems ← TempItems ∪ {SelectedItem};
    Users ← Users ∪ {NewUser};
    Edges ← Edges ∪ {TempItems × NewUser};
else
    Process analogously with a new item node
5 Numerical results

The results of the numerical experiments are divided into three subsections.
In the first part we briefly present a Java applet developed in our lab for
experimenting with various parameters of the generator. In the second part we
show which parameters affect the node degree distributions and BLCC. In the
last part we show how the number of potentially similar users and the number
of their items depend on the values of the generator's parameters.



5.1 Graphical analysis 

The applet presented in Figure 7 can be accessed online at
http://www.ipipan.eu/~sch/software/applet.html. All parameters (except the
initial number of pairs) can be changed during graph generation. The
distributions of BLCC and node degrees are updated online for both
modalities. Also, the average number of potentially similar users and of
their items is visualized on a chart. By similar users we understand all
users that have rated at least one item in common with the selected user.



[Figure 7: screenshot of the applet (BigraphVisualizer). Controls: Initialize,
Start, bouncing parameter (0.5); for USER and ITEM nodes: new node probability
(0.5), preferential attachment probability (0.5) and edges at startup. Status
line: USERS: 24, ITEMS: 26, EDGES: 100. Charts over iterations: average number
of similar USERS, average number of ITEMS of similar USERS, average clustering
coefficient (BLCC).]

Fig. 7. A bigraph generated after $t = 30$ iterations. The values of all
probabilities were set to 0.5, each new node creates three new edges
($u = v = 3$), and the initial number of pairs is $m = 10$.
5.2 Social network properties

We consider the node degree distributions of both modalities and the values
of BLCC as the network properties of the generated graphs. Node degree
distributions are controlled by two parameters: $\alpha$ and $\beta$. We show
in Figure 8 that if one parameter tends to one, the degree distribution of the
corresponding modality becomes power-law. Low values yield an exponential
distribution. Moreover, we do not observe any correlation between the
distributions of the two modalities.




Fig. 8. Left panel: blue circles indicate that the random attachment of users'
edges (i.e. items) results in the exponential distribution of item degrees.
Red triangles in both panels show that as $\alpha \to 1$ the distribution
becomes power-law. Experiments run with (m = 50, T = 10 000, p = 0.5,
u = v = 1, $\beta$ = 0.5).



The values of BLCC (bipartite local clustering coefficient) can be controlled
by the extent of the bouncing mechanism (Figure 9).

If we neglect the bouncing mechanism (b = 0), BLCC is controlled by the node
degree distributions (Figure 10).

Several other network properties can be tuned by the parameters of our model,
such as the average distance between randomly selected pairs of nodes, the
diameter of a bigraph, resilience to attack, the spread of innovations or the
creation of the largest connected component. We omit the analysis of these
features as they do not seem to have a direct impact on the performance of
recommender systems.



5.3 Neighborhood size properties 

The number of operations that a neighborhood recommender system has to
perform is related to the number of similar users and the number of their
items. We recommend a new item to the analyzed user from the items of the
users that are similar to her/him. In Figure 11 we show two intuitive results:

— the size of the neighborhood grows with the size of a graph

— the size of the neighborhood grows with the density of a graph (fixed number
of nodes and growing number of edges)

The growth of the neighborhood is relatively sharper for the number of items.
It is interesting that the number of similar users becomes stable earlier for
sparser graphs (3 and 6 edges at startup) than for denser graphs (12 and 24
edges at startup).

Fig. 9. The growth of the bouncing parameter b results in higher values of BLCC
(bipartite local clustering coefficient). If no nodes are connected in
accordance with the preferential attachment mechanism ($\alpha = \beta = 0$),
the values of b do not influence BLCC. Experiments run with (m = 50,
T = 10 000, p = 0.5, u = v = 7).

[Figure 10: two panels, BLCC for USER nodes and BLCC for ITEM nodes; x-axis:
alpha, from 0.0 to 1.0.]

Fig. 10. BLCC grows as more edges are connected with the preferential
attachment mechanism. The phenomenon is observed even when the bouncing
parameter is zero. Experiments run with (m = 50, T = 10 000, p = 0.5,
u = v = 7, b = 0).




Fig. 11. The average number of similar users (having at least one item in
common with the considered user) follows the growth in a graph's size. The
positive relation is stronger for the number of the items of the similar
users. The density of a graph (modeled by the number of startup edges) has an
even stronger impact on the size of the neighborhood than the size of a graph.
Experiments run with (m = 50, T = 10 000, p = 0.5, $\alpha = \beta = 0.5$,
b = 0).



A result of potentially great importance is shown in Figure 12. It turns out
that the impact of the shapes of the node degree distributions (controlled by
parameters $\alpha$ and $\beta$) on the sizes of the neighborhoods is not
monotonic. The more exponential-like (rather than power-law-like) the
distribution of users' degrees, the smaller the number of similar users that
is observed. In all other cases the opposite effect is identified.



The result presented in Figure 13 is somewhat disappointing. The shrinking
impact of the bouncing mechanism on the sizes of the neighborhoods is hardly
observable. The effect of bouncing is too gentle compared to the level set by
the power-law distribution. Also, random variation among generated networks
is stronger at that level than the shrinking forces. This drawback reflects
the fact that in growing random graphs a positive clustering coefficient is
correlated with a power-law node degree distribution, and we are unable to
generate graphs with both an exponential node degree distribution and a high
value of the clustering coefficient.
[Figure 12: two panels, average number of SIMILAR USERS and average number of
ITEMS of SIMILAR USERS.]

Fig. 12. The shapes of the node degree distributions of the two modalities
have opposite influence on the average number of similar users. The more
power-law-like the item degree distribution, the more neighbors can be
observed. The more heavy-tailed the distribution of user nodes, the stronger
the shrinking of the neighborhood. The arrows indicate the direction of
growth. Experiments run with (m = 50, T = 10 000, p = 0.5, u = v = 7, b = 0).



[Figure 13: two panels, average number of SIMILAR USERS and average number of
ITEMS of SIMILAR USERS; one curve for each of alpha, beta = 1.00, 0.75, 0.50,
0.25, 0.00.]

Fig. 13. The growth of the bouncing parameter b has a slight negative impact
on the size of both neighborhoods. However, the number of similar users and of
their items is determined mostly by the shapes of the node degree
distributions.
6 Conclusion

We have presented a new random graph generative algorithm dedicated to
modeling the performance of recommender systems. We have shown that the
parameters of the algorithm influence not only the pure network properties of
the created bigraphs, but also the properties related to the performance of
neighborhood-based collaborative filtering systems. Besides the above
features, the procedure enables us to output bigraphs of different sizes,
densities and proportions of the number of users to the number of items. We
plan to compare how various features of bigraphs affect the time and memory
requirements of existing systems. Consequently, we hope to better understand
the algorithms and their implementations, and finally to improve both of them.
A Degree of a neighboring node

In this appendix we derive the expected degree of a neighboring node in a
random graph (Figure 14). Let us denote by $\langle k \rangle$ and
$\langle k^2 \rangle$ the first and the second moments of the node degree
distribution of graph $G = (V, E)$.

[Figure 14: an example graph.]

Fig. 14. The expected degree of a neighbor of a randomly selected node is
larger than the average node degree.

If we pick a random node from a graph then its expected number of neighbors
(degree) is $\langle k \rangle$. Each of the $\langle k \rangle$ edges points
at a different vertex. The probability that a random edge is connected to a
node is proportional to the total number of edges that are connected with the
node. The probability that a random edge is connected to a node $i$ with
degree $k_i$ is equal to $k_i / \sum_j k_j$. Hence, the expected degree of a
neighboring node is:

    $\sum_i k_i \frac{k_i}{\sum_j k_j} = \frac{\sum_i k_i^2}{\sum_j k_j} = \frac{\langle k^2 \rangle}{\langle k \rangle}$.    (8)

The analysis is based on the assumption that there is no correlation between
the degrees of two neighboring nodes.
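We can show that this value is not smaller than $\langle k \rangle$, i.e. the
expected degree of a random node. Let us recall the Cauchy-Schwarz inequality:

    $\left( \sum_{i=1}^{n} x_i y_i \right)^2 \le \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right)$.    (9)

By putting $x_i = 1$ for $i = 1, \ldots, n$, we get:

    $\left( \sum_{i=1}^{n} y_i \right)^2 \le n \sum_{i=1}^{n} y_i^2$    (10)

and

    $\frac{\sum_{i=1}^{n} y_i}{n} \le \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} y_i}$.    (11)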
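
The inequality can be illustrated on a small degree sequence. The star graph
used below is a hypothetical example chosen for this sketch, and the helper
names are not part of the original text.

```python
def mean_degree(degrees):
    """<k>: the expected degree of a randomly selected node."""
    return sum(degrees) / len(degrees)

def mean_neighbor_degree(degrees):
    """<k^2>/<k>: the expected degree of the node at the end of a random edge."""
    return sum(k * k for k in degrees) / sum(degrees)

star = [4, 1, 1, 1, 1]   # one hub joined to four leaves
k_mean = mean_degree(star)
k_neighbor = mean_neighbor_degree(star)
```

For the star the average degree is 1.6, whereas the expected degree of a
neighbor is 2.5, because every leaf has the hub as its only neighbor. The two
values coincide only when all degrees are equal.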

B Node degree distribution 

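We follow the continuum approach [3] to derive the user node degree
distribution. The item node degree distribution can be obtained analogously.
The calculations consist of three steps. Firstly, let us solve Eq. (6):

    $\frac{\partial k_j}{\partial t} = (1 - p) v \Pi(k_j)$
    $= (1 - p) v \left( \beta \frac{1}{pt} + (1 - \beta) \frac{k_j}{\eta t} \right)$
    $= (1 - p) v \frac{1}{t} \left( \frac{\beta\eta + p(1 - \beta)k_j}{p\eta} \right)$,

which yields

    $\int \frac{p\eta}{(1 - p)v} \frac{1}{\beta\eta + p(1 - \beta)k_j} \, dk_j = \int \frac{1}{t} \, dt$.    (12)

Taking into account the initial condition $k_j(t_j) = u$, where $t_j$ is the
time of creating user $j$, and the fact that
$\int \frac{1}{ax + b} \, dx = \frac{1}{a} \ln|ax + b| + C$, we obtain

    $\frac{p\eta}{(1 - p) v p (1 - \beta)} \left( [\ln(\beta\eta + p(1 - \beta)k_j)] - [\ln(\beta\eta + p(1 - \beta)u)] \right) = [\ln t] - [\ln t_j]$,    (13)

both sides of which can be used as exponents of $e$, giving

    $\left( \frac{\beta\eta + p(1 - \beta)k_j}{\beta\eta + p(1 - \beta)u} \right)^{\frac{\eta}{(1 - p)(1 - \beta)v}} = \frac{t}{t_j}$;    (14)

after reorganizing, we have

    $k_j(t) = \frac{1}{p(1 - \beta)} \left[ (\beta\eta + p(1 - \beta)u) \left( \frac{t}{t_j} \right)^{\frac{(1 - p)(1 - \beta)v}{\eta}} - \beta\eta \right]$.    (15)

The probability that $k_j$ is smaller than a given $k$ is:

    $P\{k_j(t) < k\} = P\left\{ \frac{1}{p(1 - \beta)} \left[ (\beta\eta + p(1 - \beta)u) \left( \frac{t}{t_j} \right)^{\frac{(1 - p)(1 - \beta)v}{\eta}} - \beta\eta \right] < k \right\}$    (16)

and after reorganizing

    $P\{k_j(t) < k\} = P\left\{ t_j > t \left( \frac{\beta\eta + p(1 - \beta)k}{\beta\eta + p(1 - \beta)u} \right)^{-\frac{\eta}{(1 - p)(1 - \beta)v}} \right\}$.    (17)

We can assume that nodes are added at equal time intervals until the current
iteration $t$. The probability that the iteration of adding node $j$ is larger
than some $K < t$ equals $1 - P\{t_j < K\} = 1 - K/t$. Substituting this
assumption into Eq. (17), we obtain

    $P\{k_j(t) < k\} = 1 - P\left\{ t_j \le t \left( \frac{\beta\eta + p(1 - \beta)k}{\beta\eta + p(1 - \beta)u} \right)^{-\frac{\eta}{(1 - p)(1 - \beta)v}} \right\}$
    $= 1 - \left( \frac{\beta\eta + p(1 - \beta)k}{\beta\eta + p(1 - \beta)u} \right)^{-\frac{\eta}{(1 - p)(1 - \beta)v}}$.    (18)

We can obtain the probability density function of the random variable $k$ by
differentiating its cumulative distribution function,
$p(k) = \partial P\{k_j(t) < k\} / \partial k$; as a result we have

    $p(k) = \frac{p\eta}{(1 - p)v} \left( \beta\eta + p(1 - \beta)u \right)^{\frac{\eta}{(1 - p)(1 - \beta)v}} \left( \beta\eta + p(1 - \beta)k \right)^{-\frac{\eta}{(1 - p)(1 - \beta)v} - 1}$,

that is:

    $p(k) \propto \left( \beta\eta + p(1 - \beta)k \right)^{-\frac{\eta}{(1 - p)(1 - \beta)v} - 1}$.    (19)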
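
The closed-form solution can be cross-checked against a direct numerical
integration of the growth equation. The sketch assumes Eq. (6) with
$\Pi(k_j) = \beta / (pt) + (1 - \beta) k_j / (\eta t)$ and Eq. (15) in the form
$k_j(t) = [(\beta\eta + p(1 - \beta)u)(t / t_j)^{(1 - p)(1 - \beta)v / \eta} - \beta\eta] / (p(1 - \beta))$.
The function names, the forward Euler scheme and the parameter values are
choices made for this illustration.

```python
def degree_closed_form(t, t_j, p, u, v, beta):
    """Degree at time t of a user created at t_j with u edges (beta < 1)."""
    eta = p * u + (1 - p) * v
    a = beta * eta + p * (1 - beta) * u
    c = (1 - p) * (1 - beta) * v / eta
    return (a * (t / t_j) ** c - beta * eta) / (p * (1 - beta))

def degree_euler(t, t_j, p, u, v, beta, steps=100_000):
    """Integrate dk/dt = (1 - p) v Pi(k) with forward Euler steps."""
    eta = p * u + (1 - p) * v
    k, s = float(u), float(t_j)
    dt = (t - t_j) / steps
    for _ in range(steps):
        pi = beta / (p * s) + (1 - beta) * k / (eta * s)
        k += dt * (1 - p) * v * pi
        s += dt
    return k

# Hypothetical parameter values used only for the comparison.
params = dict(p=0.5, u=3, v=3, beta=0.3)
closed = degree_closed_form(1000, 10, **params)
numeric = degree_euler(1000, 10, **params)
```

The two values agree to within the discretization error of the Euler scheme,
and the closed form returns $u$ at $t = t_j$, as required by the initial
condition.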